DAY1

Check the original database

BLASTn of mini read subsets versus the RefSeq viral database

Make the output directory before running sbatch
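The reason for this step, as far as I know, is that Slurm does not create a missing directory for the --output/--error paths, so the job can fail without leaving a log. A minimal sketch, assuming a hypothetical directory name (blast_out/logs) and a hypothetical job script name (blast_array.sh):

```shell
# Hypothetical output directory; adjust to the project layout.
outdir="blast_out/logs"

# -p creates parent directories and does not fail if the directory already exists.
mkdir -p "$outdir"

# The submission would then follow, for example (script name is hypothetical):
#   sbatch --output="$outdir/%x_%A_%a.out" blast_array.sh
# %x = job name, %A = array master job ID, %a = array task ID.
```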

sbatch job arrays
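A minimal sketch of how one array task can pick its own input, assuming a hypothetical list file (samples.txt) with one read subset per line. Slurm sets SLURM_ARRAY_TASK_ID inside each task; the default of 1 below is only there so the snippet also runs outside Slurm.

```shell
# Hypothetical sample list, one FASTQ subset per line.
printf 'subset_01.fastq\nsubset_02.fastq\nsubset_03.fastq\n' > samples.txt

# Inside an array job Slurm exports SLURM_ARRAY_TASK_ID; fall back to 1 elsewhere.
task_id="${SLURM_ARRAY_TASK_ID:-1}"

# Print only the line whose number equals the task ID.
input=$(sed -n "${task_id}p" samples.txt)
echo "task ${task_id} processes ${input}"
```

With three lines in the list, the script would be submitted with something like `sbatch --array=1-3 script.sh` (script name hypothetical), giving one task per subset.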

BLASTn of the full FASTQ versus the RefSeq viral database

Using a one-liner with cut, sort, uniq and sort piped in this order, can you list the viral genomes hit by your BLASTn together with the number of hits (sorted by increasing number of hits)?
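One possible answer, sketched on toy data. It assumes the BLASTn result is in tabular format (-outfmt 6), where column 2 holds the subject sequence ID. The file name and the IDs (virusA, virusB) are placeholders, not real results.

```shell
# Toy stand-in for a tabular BLASTn result: query ID, subject ID, % identity.
# Values are illustrative only.
printf 'read1\tvirusA\t98.5\nread2\tvirusB\t97.0\nread3\tvirusA\t99.1\nread4\tvirusA\t96.2\n' > toy_blast.tsv

# cut -f2 : keep the subject ID column
# sort    : bring identical IDs together (uniq only merges adjacent lines)
# uniq -c : prefix each distinct ID with its count
# sort -n : order by increasing count
hits=$(cut -f2 toy_blast.tsv | sort | uniq -c | sort -n)
echo "$hits"
```

Note that this counts alignment lines, so a read with several reported alignments to the same genome contributes more than one hit.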

BIG question: is a BLASTn of all study reads versus the nt reference DB achievable?

Benchmarking to estimate cpu.hours
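A rough sketch of the extrapolation, assuming run time scales linearly with the number of reads (a simplification) and using made-up numbers for the benchmark. The estimate is cpu.hours = (elapsed seconds x CPUs / 3600) x (total reads / subset reads).

```shell
# All numbers are hypothetical placeholders; replace with measured values.
subset_reads=10000       # reads in the benchmarked subset
elapsed_s=120            # wall-clock seconds for the subset
cpus=4                   # CPUs allocated to the benchmark job
total_reads=50000000     # reads in the full study

# awk handles the floating-point arithmetic that plain shell lacks.
cpu_hours=$(awk -v n="$subset_reads" -v t="$elapsed_s" -v c="$cpus" -v N="$total_reads" \
  'BEGIN { printf "%.1f", (t * c / 3600) * (N / n) }')
echo "estimated cpu.hours: ${cpu_hours}"
```

The same arithmetic applies to the nt database, but the subset benchmark would need to be run against nt itself, since search time also depends on database size.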

DAY2

DAY3

kraken2 + bracken sbatch on a single sample

Krona tool

Load bracken data into a SQL long table

MultiQC: add the kraken2 microbiome

DAY5

Build "long" and "wide" versions of the abundance matrix in R

R session

"long" version

SARS-CoV-2
taxon_name = Severe acute respiratory syndrome-related coronavirus
taxonomy_id = 694009

Check whether any SARS-CoV-2-negative patients nevertheless show SARS-CoV-2

Check whether SARS-CoV-2 is only ever seen in RNA samples (i.e. not in DNA samples)

10 most ubiquitous human pathogens found in the 125 patients

"wide" version

Heatmap visualisation of the abundance matrix